Google’s Gemini AI model to train robots to navigate, understand world
The potential benefits of robots include helping the elderly and improving the efficiency of workplaces.
Google's DeepMind robotics team has made major progress by using Gemini 1.5 Pro's long context window to train robots for navigation and task completion.
The long context window lets the AI model take in more information at once, so the robots can better understand and remember their environments.
DeepMind's approach involves having robots "watch" video tours of locations, much like a human would. The long context window lets the AI process extensive information in one go, enhancing how robots learn and interact with their surroundings.
The process is as follows:
Researchers film a tour of the office and show it to the robot.
The robot then learns the layout of the office, including objects and other features of the space.
When a user gives a command, the robot draws on its memory of the tour to navigate.
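The three steps above can be sketched as a toy program. This is an illustration only, not DeepMind's implementation: the tour frames, the map of the office, and the function names find_goal_frame and plan_path are all hypothetical, and a simple word match stands in for the Gemini model.

```python
from collections import deque

# Hypothetical tour: (frame id, what is visible in that frame).
TOUR = [
    (0, "entrance door"),
    (1, "desk with monitor"),
    (2, "whiteboard"),
    (3, "shelf with plastic bins"),
]

# Assumed map: consecutive tour frames are reachable from each other.
GRAPH = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}


def find_goal_frame(instruction, shown_object, tour):
    """Stand-in for the AI model: pick the tour frame whose description
    shares the most words with the instruction and the shown object."""
    text = (instruction + " " + shown_object).lower().replace("?", "")
    words = set(text.split())

    def overlap(frame):
        return len(words & set(frame[1].split()))

    best = max(tour, key=overlap)
    return best[0] if overlap(best) > 0 else None


def plan_path(graph, start, goal):
    """Breadth-first search from the current frame to the goal frame."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


goal = find_goal_frame("Where should I return this?", "plastic bin", TOUR)
print(goal, plan_path(GRAPH, 0, goal))
```

In this sketch the robot starts at the entrance (frame 0), matches the request to the frame showing the shelf, and walks the frames in between. The real system would replace the word match with the model reading the whole video tour.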
For example, if you show the robot a plastic bin and ask, "Where should I return this?" it can lead you to the shelf where the bin belongs, based on what it saw in the video.
However, the tests have so far been conducted only in controlled environments.
According to DeepMind, these robots have been tested in a 9,000-square-foot area and successfully followed over 50 different instructions 90% of the time, showing a significant improvement in navigating complex spaces.
The potential benefits of these robots include helping the elderly and improving the efficiency of workplaces.
For instance, if a user asks whether a fruit is available in the refrigerator, the robot can move to the refrigerator, check for the availability of fruit, and return and communicate its findings.
At present, the system takes 10 to 30 seconds to process each instruction, which is considered too slow for practical use.
The DeepMind team is working on making the system faster and capable of handling more complex tasks. DeepMind also said that as the technology improves, these robots could navigate and understand the world as humans do.